Abstract

We introduce a novel algorithm for decoding binary linear codes by linear programming. 
We build on the LP decoding algorithm of Feldman et al. and introduce a post-processing step that solves a second linear program whose objective function is reweighted based on the output of the original LP decoder. Our analysis shows that for some LDPC ensembles we can improve the provable threshold guarantees compared to standard LP decoding. We also show significant empirical performance gains for the reweighted LP decoding algorithm with very small additional computational complexity.

1 Introduction 

Linear programming (LP) decoding for binary linear codes was introduced by Feldman, Karger and Wainwright [2]. The method is based on solving a linear programming relaxation of the integer program corresponding to the maximum likelihood (ML) decoding problem. LP decoding is connected to message-passing decoding [?], [1] and to graph covers [?], and has received substantial recent attention (see, e.g., [?] and [7]).

In the same spirit as the work described here, a related line of research has studied various improvements either to standard iterative decoding [?], [9] or to LP decoding via nonlinear extensions [10] or loop corrections [11].

The practical performance of LP decoding is roughly comparable to min-sum decoding and slightly inferior to sum-product decoding. In contrast to message-passing decoding, however, the LP decoder either concedes failure on a problem or returns a codeword along with a guarantee that it is the ML codeword, thereby eliminating any undetected decoding errors.

The main idea of this paper is to add a second LP as a post-processing step when the original LP decoding fails and outputs a fractional pseudocodeword. We use the difference between the input channel likelihood and the pseudocodeword coordinate to obtain a measure of disagreement, or unreliability, for each bit. We then use this unreliability to bias the objective function and re-run the LP with the reweighted objective function. The reweighting increases the cost of changing reliable bits and decreases the cost of changing unreliable bits. We present an analysis showing that the provable BSC recovery thresholds improve for certain families of LDPC codes. We stress
that the actual thresholds, even for the original LP decoding algorithm, remain unknown. Our 
analysis only establishes that the obtainable lower bounds on the fraction of recoverable errors 
are improved compared to the corresponding bounds for LP decoding. It is possible, however, 
that this is just an artifact of the lower bound techniques and that the true threshold is identical 
for both algorithms. In any case, the empirical performance gains we observe in our preliminary 
experimental analysis seem quite substantial. 

A central idea in our analysis is a notion of robustness to changes in the BSC bit-flipping probability. This concept was inspired by a similar reweighted iterative $\ell_1$ minimization idea for compressive sensing [?], [20]. We note that the reweighting idea of this paper involves changing the objective function of the LP, and is therefore different from the reweighted max-product algorithm [12].

2 Basic Definitions 

A vector $x \in \mathbb{R}^n$ is called $k$-sparse if it has exactly $k$ nonzero entries. The support set of a sparse vector $x$ is the index set of its nonzero entries. If $x$ is not sparse, the $k$-support set of $x$ is defined as the index set of the $k$ largest entries of $x$ in magnitude. We use $\|x\|_p$ to denote the $\ell_p$ norm of a vector $x$ for $p > 0$; in addition, $\|x\|_0$ is defined to be the number of nonzero entries in $x$. For a set $S$, the cardinality of $S$ is denoted by $|S|$, and if $S \subseteq \{1, 2, \dots, n\}$, then $x_S$ is the subvector formed by the entries of $x$ indexed by $S$. The complement of $S$ is denoted by $S^c$. The rate of a binary linear code $\mathcal{C}$ is denoted by $R$, and the corresponding parity check matrix is $H \in \mathbb{F}_2^{m \times n}$, where $n$ is the length of each codeword and $m = (1-R)n$. The factor graph corresponding to $\mathcal{C}$ is denoted by $\mathcal{G} = (X_v, X_c, \mathcal{E})$, where $X_v$ and $X_c$ are the sets of variable nodes and check nodes, respectively, and $\mathcal{E}$ is the set of edges. For regular graphs, $d_v$ and $d_c$ denote the degrees of the variable and check nodes, respectively. The girth of a graph $\mathcal{G}$, denoted by $\mathrm{girth}(\mathcal{G})$, is defined to be the length of the shortest cycle in $\mathcal{G}$.
3 Background



Consider a memoryless channel with binary input and output alphabet $\mathcal{Y}$, defined by the transition probabilities $P_{Y|X}(y|x)$. For a received symbol $y$, the log-likelihood ratio is defined as $\log\frac{P_{Y|X}(y|0)}{P_{Y|X}(y|1)}$, where $x$ is the transmitted symbol. If a codeword $x^{(c)}$ of length $n$ from the linear code $\mathcal{C}$ is transmitted through the channel and an output vector $x^{(r)}$ is received, a maximum likelihood decoder can be used to estimate the transmitted codeword by finding the most likely transmitted input codeword. Let $\gamma_i$ be the log-likelihood ratio assigned to the $i$-th received symbol $x^{(r)}_i$, and let $\gamma = (\gamma_1, \dots, \gamma_n)^T$ be the likelihood vector. The ML decoder can be formalized as follows [1]:

ML decoder: minimize $\gamma^T x$

subject to $x \in \mathrm{conv}(\mathcal{C})$, (1)

where $\mathrm{conv}(\mathcal{C})$ is the convex hull of all the codewords of $\mathcal{C}$ in $\mathbb{R}^n$. The linear program (1) solves the ML decoding problem by virtue of the fact that the objective $\gamma^T x$ is minimized at a corner point (vertex) of $\mathrm{conv}(\mathcal{C})$, which is necessarily a codeword (in fact, the vertices of $\mathrm{conv}(\mathcal{C})$ are exactly the codewords of $\mathcal{C}$). In a linear program, the polytope over which the optimization is performed is described by linear inequalities describing its facets. Since ML decoding of general linear codes is NP-hard, it is unlikely that $\mathrm{conv}(\mathcal{C})$ can be described efficiently. Feldman et al. introduced a relaxation of (1) by replacing the polytope $\mathrm{conv}(\mathcal{C})$ with a new polytope $\mathcal{P}$ that has far fewer facets, contains $\mathrm{conv}(\mathcal{C})$, and retains the codewords of $\mathcal{C}$ as its vertices [1]. One way to construct $\mathcal{P}$ is the following. If the parity check matrix of $\mathcal{C}$ is the $m \times n$ matrix $H$ and $h_j$ is the $j$-th row of $H$, then

$$\mathcal{P} = \bigcap_{1 \le j \le m} \mathrm{conv}(\mathcal{C}_j), \tag{2}$$

where $\mathcal{C}_j = \{x \in \{0,1\}^n \mid h_j x = 0 \bmod 2\}$. As mentioned earlier, with this construction all codewords of $\mathcal{C}$ are vertices of $\mathcal{P}$. However, $\mathcal{P}$ also has additional vertices with fractional entries in $[0,1]^n$. A vertex of the polytope $\mathcal{P}$ is called a pseudocodeword. Moreover, if a pseudocodeword is integral, i.e., if all its entries are 0 or 1, then it is necessarily a codeword. The LP relaxation of (1) can thus be written as:

LP decoder: minimize $\gamma^T x$

subject to $x \in \mathcal{P}$. (3)
The number of facets of $\mathcal{P}$ is exponential in the maximum weight of a row of $H$. Therefore, for LDPC codes with a small (often constant) row weight, $\mathcal{P}$ has a polynomial number of facets, and (3) can be solved in polynomial time.
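To make the relaxation (2)–(3) concrete, here is a minimal sketch in Python, assuming NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS solver) are available. It uses the standard description of $\mathrm{conv}(\mathcal{C}_j)$ for a single parity check due to Feldman et al.: $0 \le x_i \le 1$ together with one inequality $\sum_{i \in V} x_i - \sum_{i \in \mathrm{supp}(h_j) \setminus V} x_i \le |V| - 1$ for every odd-size subset $V$ of the support of $h_j$. This also shows why the number of facets grows exponentially with the row weight. The helper names `feldman_constraints` and `lp_decode` and the toy Hamming code are illustrative choices, not taken from the original work.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def feldman_constraints(H):
    """Return (A, b) with A x <= b describing the polytope P of (2).

    For each check j and each odd-size subset V of supp(h_j):
        sum_{i in V} x_i - sum_{i in supp(h_j) minus V} x_i <= |V| - 1.
    """
    H = np.asarray(H, dtype=int)
    m, n = H.shape
    rows, rhs = [], []
    for j in range(m):
        support = np.flatnonzero(H[j])
        for size in range(1, len(support) + 1, 2):   # odd subset sizes only
            for V in itertools.combinations(support, size):
                a = np.zeros(n)
                a[support] = -1.0
                a[list(V)] = 1.0
                rows.append(a)
                rhs.append(size - 1)
    return np.array(rows), np.array(rhs, dtype=float)


def lp_decode(H, gamma):
    """Solve the LP decoder (3): minimize gamma^T x subject to x in P."""
    A, b = feldman_constraints(H)
    n = len(gamma)
    res = linprog(c=gamma, A_ub=A, b_ub=b, bounds=[(0, 1)] * n, method="highs")
    return res.x


# Toy example: a (7,4) Hamming code; the received word equals the codeword (1,1,0,0,0,0,1).
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
received = np.array([1, 1, 0, 0, 0, 0, 1])
gamma = 1.0 - 2.0 * received       # scaled BSC log-likelihood ratios
x_hat = lp_decode(H, gamma)        # integral here, hence the ML codeword
```

Each weight-4 check contributes $4 + 4 = 8$ odd-subset inequalities, so this toy code yields 24 constraints; for an LDPC code with constant row weight the count grows only linearly in $m$.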

For binary symmetric channels, (3) has another useful interpretation. In this case, rather than minimizing $\gamma^T x$, one can equivalently minimize the Hamming distance between the channel output $x^{(r)}$ and the individual codewords $x \in \mathcal{C}$. Using the fact that the LP relaxation with $\mathcal{P}$ relaxes the entries of $x$ from $x_i \in \{0,1\}$ to $x_i \in [0,1]$, we may replace the Hamming distance with the $\ell_1$ distance $\|x - x^{(r)}\|_1$. For $x_i \in [0,1]$, the term $|x_i - x^{(r)}_i|$ equals $x_i$ if $x^{(r)}_i = 0$ and $1 - x_i$ if $x^{(r)}_i = 1$, which is an affine function of $x_i$ whose slope has the sign of $\gamma_i$. This implies that the decoder is equivalent to

BSC-LP decoder: minimize $\|x - x^{(r)}\|_1$

subject to $x \in \mathcal{P}$. (4)

The above formulation can be interpreted as follows. For a received binary output vector $x^{(r)}$, the solution of the LP decoder is the pseudocodeword closest to $x^{(r)}$ in the $\ell_1$ sense.
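The equivalence between (3) and (4) on $[0,1]^n$ can be checked numerically. The following minimal sketch assumes NumPy and the scaled BSC log-likelihood ratios $\gamma_i = 1 - 2x^{(r)}_i$. The true ratios are the positive multiple $\log\frac{1-p}{p}$ of these for $p < 1/2$, which does not change the minimizer. For every $x \in [0,1]^n$ it confirms that $\|x - x^{(r)}\|_1 = \gamma^T x + \|x^{(r)}\|_1$, so the two objectives differ only by a constant. The function names are illustrative.

```python
import numpy as np


def bsc_llr(received):
    """Scaled BSC log-likelihood ratios: +1 for a received 0, -1 for a received 1."""
    return 1.0 - 2.0 * np.asarray(received, dtype=float)


def l1_distance(x, received):
    """l1 distance between a point of [0, 1]^n and the received binary word."""
    return float(np.abs(np.asarray(x, float) - np.asarray(received, float)).sum())


rng = np.random.default_rng(0)
received = rng.integers(0, 2, size=10)
for _ in range(100):
    x = rng.random(10)                     # arbitrary point of [0, 1]^n
    # ||x - x_r||_1 = gamma^T x + (number of ones in x_r)
    assert np.isclose(l1_distance(x, received),
                      bsc_llr(received) @ x + received.sum())
```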

Linear programming decoding was first introduced by Feldman et al. [1], [2]. Subsequently, it was shown in [13] that if the parity check matrix is chosen to be the adjacency matrix of a high-quality expander, LP decoding can correct a constant fraction of errors. A fundamental lemma in [2], used in the results therein, is that the LP polytope $\mathcal{P}$ looks the same from the viewpoint of every codeword; therefore, for the analysis of LP decoding, it can be assumed without loss of generality that the transmitted codeword is the all-zero codeword. The theoretical results of [13] were based on a dual witness argument, i.e., a feasible set of dual variables that sets the value of the dual LP equal to zero. However, the bounds on the success threshold of LP decoding achieved by this technique are considerably smaller than the empirical recovery threshold of the LP decoder in practice. A later analysis of LP decoding by Daskalakis et al. [14] improved upon those bounds for random expander codes by employing a different dual witness argument and by considering a weak notion of LP success rather than the strong notion of [13]. A strong threshold means that every set of errors up to a certain size can be corrected, whereas a weak threshold means that almost all error sets of a certain size are recoverable. Note that there is a gap of about one order of magnitude between the error-correcting thresholds of [2] and those observed in practice.

The arguments of [13] and [14] are based on the existence of dual certificates that guarantee the success of the LP decoder, and require codes that are based on bipartite expander graphs. A more recent work of Arora et al. [15] uses a quite different certificate based on the primal LP problem. This approach results in considerably easier computations and significantly better thresholds for LP decoding. However, the underlying codes discussed in [15] are based on factor graphs with a large girth (at least doubly logarithmic in the number of variables), rather than the unbalanced expanders considered in previous arguments. Note that, similar to [14], the bounds of [15] are weak bounds, certifying that for a random set of errors up to a certain fraction of the bits, LP decoding succeeds with high probability. The largest such fraction is called the weak recovery threshold.

A problem somewhat related to LP decoding of linear codes is compressed sensing (CS). In CS, an unknown real vector $x$ of size $n$ is to be recovered from a set of $m$ linear measurements, represented by $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ and $m \ll n$. This is in general infeasible, since the measurement matrix $A$ is underdetermined and the resulting system of equations is ill-posed, i.e., it can have infinitely many solutions. However, imposing a sparsity condition on $x$ can make the solution unique. The unique sparse solution can be found, for instance, by exhaustive search, which is formulated as the following minimization program:

minimize $\|x\|_0$

subject to $Ax = y$. (5)

Since (5) is NP-hard, one possible approximation is to relax the $\ell_0$ norm of $x$ to the closest convex norm $\|x\|_1$, which results in the following $\ell_1$ minimization program:

minimize $\|x\|_1$ (6)
subject to $Ax = y$. (7)

Program (6)–(7) is a linear program, which can in general be solved in polynomial time. There has been substantial theoretical work on this linear programming relaxation; see, e.g., [18], [19], [23], [?], [26]. Recently, systematic connections between channel coding LP and CS $\ell_1$ relaxation have been found [16], [17]. In this paper, we build on those connections to improve LP decoding, and further extend the ideas of robustness and reweighted $\ell_1$ minimization in compressed sensing to channel coding LP.
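Since (6)–(7) is a linear program, it can be handed to a generic LP solver. A minimal sketch, assuming NumPy and SciPy's `linprog` with the HiGHS backend: the split $x = u - v$ with $u, v \ge 0$ is a standard reformulation that makes $\|x\|_1 = \mathbf{1}^T(u + v)$ at the optimum. The problem sizes in the example (a 4-sparse vector of length 100 and 50 Gaussian measurements) are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, y):
    """Solve (6)-(7): minimize ||x||_1 subject to Ax = y.

    Uses the split x = u - v with u, v >= 0, so that ||x||_1 = 1^T (u + v)
    at the optimum and the problem becomes an ordinary linear program.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]


# Example: recover a 4-sparse vector of length 100 from 50 Gaussian measurements.
rng = np.random.default_rng(1)
n, m, k = 100, 50, 4
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x_hat = l1_minimize(A, A @ x_true)
```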

4 Extended Certificate and Robustness of LP decoder 

The success of the LP decoder is often certified by the existence of a dual witness [13], [14]. Similarly, for $\ell_1$ minimization in the context of CS, a dual witness certificate can guarantee that the recovery of sparse signals is successful [22]. However, it has proven more promising to express the success condition of $\ell_1$ minimization in terms of the properties of the null space of the measurement matrix [23], [24], [25]. The condition is called the null space property, through which it is possible to characterize one class of "good" measurement matrices for CS, namely matrices that are well suited to $\ell_1$ minimization decoding. The advantage of the null space interpretation, apart from the fact that it results in sharper analytical bounds, is that with proper parametrization it can also be used to evaluate the performance of $\ell_1$ minimization in the presence of noise. This is known as the robustness of $\ell_1$ minimization. A consequence of the robustness property is that when $\ell_1$ minimization fails to recover a sparse signal, it often gives a decent approximation to it [20]. To the best of our knowledge, a similar certificate has not been introduced in the context of channel coding linear programming. In other words, when LP decoding fails to return an integral solution, it is not known how close the output lies to the actual codeword. We provide an approximate answer to this question in this section, using the following strategy. We introduce a property called the fundamental cone property for an arbitrary code $\mathcal{C}$, and show that for binary symmetric channels it is related to the robustness of the solution of the LP decoder. The robustness of LP decoding has two consequences. First, it implies that the linear program is tolerant to a limited mismatch in the available formulation. Second, it can be used to develop iterative schemes that improve the performance of the decoder. We discuss these issues in the following sections. We begin by defining the fundamental cone of a code, following [16].

Definition 1. Let $H$ be a parity check matrix. Define $J$ and $I$ to be the sets of rows and columns of $H$, respectively. Also, for each $j \in J$, define $I_j = \{i \in I \mid H(j,i) = 1\}$. The fundamental cone $\mathcal{K}(H)$ of $H$ is the set of all vectors $\omega = (\omega_1, \omega_2, \dots, \omega_n) \in \mathbb{R}^n$ that satisfy

$$\omega_i \ge 0, \quad \forall\, 1 \le i \le n, \tag{8}$$

$$\omega_i \le \sum_{i' \in I_j \setminus \{i\}} \omega_{i'}, \quad \forall\, j \in J,\ \forall\, i \in I_j. \tag{9}$$

$\mathcal{K}(H)$ is the smallest cone in $\mathbb{R}^n$ that contains the polytope $\mathcal{P}$. If a vector lies on an edge of $\mathcal{K}$, it is called a minimal pseudocodeword. For simplicity, in the sequel we write $\mathcal{K}$ instead of $\mathcal{K}(H)$ whenever there is no ambiguity.

Definition 2. Let $S \subseteq \{1, 2, \dots, n\}$ and $C \ge 1$ be fixed. A code $\mathcal{C}$ with parity check matrix $H$ is said to have the fundamental cone property FCP$(S, C)$ if for every nonzero vector $\omega \in \mathcal{K}(H)$ the following holds:

$$C\,\|\omega_S\|_1 < \|\omega_{S^c}\|_1. \tag{10}$$

If $\mathcal{C}$ has FCP$(S, C)$ for every index set $S$ of size $k$, then we say that $\mathcal{C}$ has the fundamental cone property FCP$(k, C)$.
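For a specific index set $S$ and factor $C$, condition (10) can be checked exactly with a single linear program. Since $\mathcal{K}(H)$ is a polyhedral cone whose vectors are nonnegative by (8), FCP$(S, C)$ holds if and only if the minimum of $\|\omega_{S^c}\|_1 - C\|\omega_S\|_1$ over the normalized cone $\{\omega \in \mathcal{K}(H) : \sum_i \omega_i = 1\}$ is strictly positive. The sketch below assumes NumPy and SciPy's `linprog`; the function name `fcp_margin` and the toy Hamming code are hypothetical choices for illustration. Checking FCP$(k, C)$ this way would require enumerating all sets of size $k$, which is only practical for tiny codes.

```python
import numpy as np
from scipy.optimize import linprog


def fcp_margin(H, S, C):
    """Minimum of ||w_{S^c}||_1 - C ||w_S||_1 over the normalized fundamental cone.

    FCP(S, C) holds if and only if the returned value is strictly positive.
    """
    H = np.asarray(H, dtype=int)
    m, n = H.shape
    in_S = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    cost = np.where(in_S, -C, 1.0)            # entries are nonnegative, so ||.||_1 is linear
    rows = []
    for j in range(m):                        # condition (9): one row per (j, i) with i in I_j
        I_j = np.flatnonzero(H[j])
        for i in I_j:
            a = np.zeros(n)
            a[I_j] = -1.0
            a[i] = 1.0                        # w_i - sum_{i' in I_j, i' != i} w_i' <= 0
            rows.append(a)
    res = linprog(cost, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, n)), b_eq=[1.0],
                  bounds=[(0, None)] * n, method="highs")   # condition (8)
    return res.fun


H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
print(fcp_margin(H, [0], 1.0))   # positive: FCP({first bit}, 1) holds for this toy code
print(fcp_margin(H, [0], 3.0))   # negative: FCP({first bit}, 3) fails
```

For this toy code, summing the three instances of (9) for the first bit shows that $\omega_1 \le 0.4$ on the normalized cone, so the margin equals $1 - 0.4(1 + C)$ and FCP holds exactly for $C < 1.5$.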

In the next lemma we show how the fundamental cone property can be used to evaluate the performance of an LP decoder even when it fails to recover the true codeword. The key assumption is that the channel is a bit flipping channel (e.g., a BSC).

Lemma 4.1. Let $\mathcal{C}$ be a code that has FCP$(S, C)$ for some index set $S$ and some $C > 1$. Suppose that a codeword $x^{(c)}$ from $\mathcal{C}$ is transmitted through a bit flipping channel, and the received word is $x^{(r)}$. If the pseudocodeword $x^{(p)}$ is the output of the LP decoder for the received word $x^{(r)}$, then the following holds:

$$\|x^{(p)} - x^{(c)}\|_1 \le 2\,\frac{C+1}{C-1}\,\|(x^{(r)} - x^{(c)})_{S^c}\|_1. \tag{11}$$

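Proof. Without loss of generality, we may assume that the all-zero codeword was transmitted, i.e., $x^{(c)} = 0$. We have

$$\begin{aligned}
\|x^{(r)}_S\|_1 + \|x^{(r)}_{S^c}\|_1 &= \|x^{(r)}\|_1 \\
&\overset{(a)}{\ge} \|x^{(p)} - x^{(r)}\|_1 \\
&= \|(x^{(p)} - x^{(r)})_S\|_1 + \|(x^{(p)} - x^{(r)})_{S^c}\|_1 \\
&\overset{(b)}{\ge} \|x^{(r)}_S\|_1 - \|x^{(p)}_S\|_1 + \|x^{(p)}_{S^c}\|_1 - \|x^{(r)}_{S^c}\|_1.
\end{aligned} \tag{12}$$

(a) holds because, by (4), $\|x^{(p)} - x^{(r)}\|_1 \le \|x^{(c)} - x^{(r)}\|_1 = \|x^{(r)}\|_1$. Also, (b) holds by the triangle inequality. Note that $x^{(p)} \in \mathcal{P} \subseteq \mathcal{K}(H)$, so by definition $C\|x^{(p)}_S\|_1 \le \|x^{(p)}_{S^c}\|_1$. This implies that

$$\|x^{(p)}_{S^c}\|_1 - \|x^{(p)}_S\|_1 \ge \frac{C-1}{C+1}\,\|x^{(p)}\|_1, \tag{13}$$

since $(C+1)\big(\|x^{(p)}_{S^c}\|_1 - \|x^{(p)}_S\|_1\big) - (C-1)\big(\|x^{(p)}_S\|_1 + \|x^{(p)}_{S^c}\|_1\big) = 2\big(\|x^{(p)}_{S^c}\|_1 - C\|x^{(p)}_S\|_1\big) \ge 0$. Rearranging (12) gives $\|x^{(p)}_{S^c}\|_1 - \|x^{(p)}_S\|_1 \le 2\|x^{(r)}_{S^c}\|_1$, and combining this with (13) we obtain

$$\|x^{(p)}\|_1 \le 2\,\frac{C+1}{C-1}\,\|x^{(r)}_{S^c}\|_1,$$

which is the desired result, since $x^{(c)} = 0$. ■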



An asymptotic case of Lemma 4.1 for $C \to 1^{+}$ is in fact equivalent to the LP success condition. Namely, let $S$ be the index set of the flipped bits, i.e., the set of bits in which $x^{(r)}$ and $x^{(c)}$ differ. Then $(x^{(r)} - x^{(c)})_{S^c} = 0$, so if FCP$(S, C)$ holds for some $C > 1$, Lemma 4.1 implies that LP decoding successfully recovers the original codeword. Now suppose that the set of errors (flipped bits) is slightly larger than $S$ and contains $S$. Then the vector $(x^{(r)} - x^{(c)})_{S^c}$ has a few (but not too many) nonzero entries. Therefore, even if the LP decoder output $x^{(p)}$ is not equal to the actual codeword, it is still possible to obtain an upper bound on its $\ell_1$ distance to the unknown codeword. We refer to this as the robustness of the LP decoder, and characterize it by FCP$(S, C)$ for $C > 1$. Furthermore, two notions of robustness can be considered. Strong robustness means that the FCP condition holds for every set $S$ of up to some cardinality $k$, namely FCP$(k, C)$. Weak robustness, on the other hand, deals with almost all sets $S$ of up to a certain size. In the next section we present a thorough analysis of LP robustness for two categories of codes: expander codes and codes with $\Omega(\log\log n)$ girth. For these two classes of codes, rigorous analyses of the performance of LP decoders have been carried out in [13], [14] and [15], respectively. We build on the existing arguments to incorporate the robustness condition and analyze the fundamental cone property. Afterwards, we discuss the implications of LP robustness.

5 Analysis of LP Robustness 

In most cases, if there exists a certificate for the success of the LP decoder, it can be extended to guarantee that the LP decoder is robust, namely that the FCP condition is satisfied for some $C > 1$. By carefully re-examining the analysis of the LP decoder, one may be able to obtain such a generalization. This is the main focus of this section. We consider three major methods that exist in the literature for analyzing the performance of LP decoders. The first is due to Feldman et al. [13], and is based on a dual witness argument that certifies the success of the LP decoder for expander graphs. The second is that of Daskalakis et al. [14], which again considers linear programming decoding of expander codes. Specifically, [14] analyzes the dual of the LP and finds a simple combinatorial condition for the dual value to be zero (implying that the LP decoder is successful). The condition is essentially the existence of a so-called hyperflow from the set of flipped bits to the unflipped bits. The existence of a valid hyperflow can be secured by the presence of so-called $(p, q)$-matchings. It then follows from a detailed series of probabilistic calculations that $(p, q)$-matchings of interest exist for certain expander codes. The main difference between this analysis and that of Feldman et al. is the probabilistic nature of the arguments in [14], which accounts for weak recovery thresholds.

A third analysis of the LP decoder was carried out by Arora et al. [15], and is based on factor graphs with doubly logarithmic girth. Unlike previous dual feasibility arguments, the authors of [15] introduce a certificate in the primal domain of the following form: if, in the primal LP problem, the value of the objective function at the original codeword is smaller than its value at all vectors within a local deviation from the original codeword, then the LP decoder succeeds. Local deviations are defined by weighted minimal local trees whose induced subgraphs are cycle-free.



5.1 Strong LP Robustness for Expander Codes 

Strong thresholds of LP decoding for expander codes are derived in [13]. To show that the transmitted codeword is the LP optimum of (3) when a subset of the bits is flipped, a set of feasible dual variables satisfying the following conditions is found. Suppose the factor graph of $\mathcal{C}$ is denoted by $\mathcal{G} = (X_v, X_c, \mathcal{E})$. We may also assume without loss of generality that the all-zero codeword was transmitted. A set of feasible dual variables is defined as follows (see [13] for more details).



Definition 3. For an error set $S$, a set of feasible dual variables is a labeling of the edges of the factor graph $\mathcal{G}$, say $\{\tau_{ij} \mid (v_i, c_j) \in \mathcal{E}\}$, such that the following two conditions are satisfied:

i) For every check node $c_j \in X_c$ and every two distinct neighbors of $c_j$, say $v_i, v_{i'} \in N(c_j)$, we have $\tau_{ij} + \tau_{i'j} \ge 0$.

ii) For every variable node $v_i \in X_v$, we have $\sum_{c_j \in N(v_i)} \tau_{ij} < \gamma_i$.
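Conditions i) and ii) of Definition 3 are easy to verify for a given labeling. A minimal sketch assuming NumPy, with the labels stored in an $m \times n$ array `tau` whose entry `tau[j, i]` is $\tau_{ij}$ (entries off the edges of $\mathcal{G}$ are ignored). The function name is illustrative, and the example labeling for a single flipped bit in a toy Hamming code was constructed by hand for this sketch, not taken from [13].

```python
import itertools

import numpy as np


def is_dual_feasible(H, tau, gamma, tol=1e-12):
    """Check conditions i) and ii) of Definition 3 for edge labels tau[j, i]."""
    H = np.asarray(H, dtype=int)
    tau = np.asarray(tau, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    m, n = H.shape
    for j in range(m):                                   # condition i)
        nbrs = np.flatnonzero(H[j])
        for i, i2 in itertools.combinations(nbrs, 2):
            if tau[j, i] + tau[j, i2] < -tol:
                return False
    for i in range(n):                                   # condition ii), strict
        checks = np.flatnonzero(H[:, i])
        if not tau[checks, i].sum() < gamma[i]:
            return False
    return True


# Hamming (7,4) code, all-zero codeword sent, bit 0 flipped (gamma = -1 on S, +1 elsewhere).
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
gamma = np.ones(7)
gamma[0] = -1.0
tau = np.where(H == 1, 0.4, 0.0)   # +0.4 on every edge ...
tau[:, 0] = -0.4                   # ... except the edges at the flipped bit (it lies in all three checks)
assert is_dual_feasible(H, tau, gamma)
```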

We show that a generalized set of dual feasible variables can be used to derive LP robustness. To this end, we show that the existence of a set of feasible dual variables implies the FCP condition. The following lemma is proved in Appendix A.

Lemma 5.1. Suppose that a set of dual variables satisfies the feasibility conditions (Definition 3) for an arbitrary log-likelihood vector $\gamma$. Then for every nonzero vector $\omega \in \mathcal{K}(H)$, the following holds:

$$\sum_{1 \le i \le n} \gamma_i\, \omega_i > 0.$$



A special case of Lemma 5.1 is when the channel is a BSC and a set $S$ of the bits has been flipped. We can also assume without loss of generality that the all-zero codeword was transmitted. Then Lemmas 4.1 and 5.1 imply that if a dual feasible set exists, then the LP decoder succeeds, which is the conclusion of [13]. In this case the log-likelihood vector $\gamma$ takes the value $-1$ over the set $S$ and $1$ over the set $S^c$. Let us now define a new likelihood vector $\gamma'$ by

$$\gamma'_i = \begin{cases} -C, & i \in S, \\ 1, & i \in S^c, \end{cases}$$

for some $C > 1$. If a dual feasible set exists that satisfies the feasibility conditions for $\gamma'$, then, since $\sum_i \gamma'_i \omega_i = \|\omega_{S^c}\|_1 - C\|\omega_S\|_1$ for every $\omega \in \mathcal{K}$, it follows from Lemma 5.1 that FCP$(S, C)$ holds. Knowing this, and pursuing an argument very similar to [13] for the construction of dual feasible variables in expander codes, we are able to prove the following theorem, the proof of which is given in Appendix B.

Theorem 5.1. Let $\mathcal{G}$ be the factor graph of a code $\mathcal{C}$ of length $n$ and rate $R$, and let $\delta > 2/3 + 1/d_v$. If $\mathcal{G}$ is a bipartite $(\alpha n, \delta d_v)$ expander graph, then $\mathcal{C}$ has FCP$(t, C)$, where

$$t = \frac{\cdots}{2\delta - 1}\,\alpha n, \qquad C = \frac{\cdots}{2\delta - 1 - 1/d_v}.$$

This means that FCP$(S, C)$ holds for every set $S$ of size $t$.

Basically, [13] shows that if the conditions of Theorem 5.1 are satisfied, then LP decoding succeeds for every error set of size $t$, namely that FCP$(t, 1)$ holds. However, Theorem 5.1 asserts that, in addition, strong robustness holds, i.e., FCP$(t, C)$ for some $C > 1$.

5.2 Weak LP Robustness for Expander Codes 

We show that for random expander codes, a probabilistic analysis similar to the dual witness analysis of [14] can be used to determine the extent to which the fundamental cone property holds for expander codes, in a weak sense. We rely on the matching arguments of [13], with appropriate adjustments. The following definition is given in [14].



Definition 4. For nonnegative integers $p$ and $q$ and a set $F$ of variable nodes, a $(p, q)$-matching on $F$ is defined by the following conditions:

(a) each bit $v_i \in F$ must be matched with $p$ distinct check nodes, and

(b) each variable node $v_{i'} \in F^c$ must be connected with

$$x_{i'} := \max\{q - d_v + Z_{i'},\ 0\} \tag{17}$$

check nodes from the set $N(F)$ that are different from the check nodes that the nodes in $F$ are matched to, where $Z_{i'}$ is defined as $Z_{i'} := |N(i') \cap N(F)|$.
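The quantity $x_{i'}$ in (17) depends only on the graph and on $F$. The following minimal sketch assumes NumPy and computes it for every $v_{i'} \in F^c$. It uses each node's own degree in place of $d_v$, which coincides with $d_v$ for the regular graphs considered here. It does not search for the matching itself; that could be done, for example, with a max-flow computation. The function name and the toy Hamming code are illustrative.

```python
import numpy as np


def matching_requirements(H, F, q):
    """Return {i': x_{i'}} from (17) for every variable node i' outside F.

    x_{i'} = max(q - d_v + Z_{i'}, 0), where Z_{i'} = |N(i') ∩ N(F)|.
    """
    H = np.asarray(H, dtype=int)
    F = sorted(set(F))
    N_F = set(np.flatnonzero(H[:, F].any(axis=1)).tolist())   # checks adjacent to F
    requirements = {}
    for i in range(H.shape[1]):
        if i in F:
            continue
        N_i = set(np.flatnonzero(H[:, i]).tolist())
        d_v = len(N_i)                    # degree of this variable node
        Z = len(N_i & N_F)
        requirements[i] = max(q - d_v + Z, 0)
    return requirements


H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
req = matching_requirements(H, F=[4], q=2)
```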

We prove the following lemma, which relates the existence of a $(p, q)$-matching to the fundamental cone property of a code $\mathcal{C}$. This lemma is proved in Appendix C.

Lemma 5.2. Let $\mathcal{C}$ be a code of rate $R$ with a bipartite factor graph $\mathcal{G}$ in which every variable node has degree $d_v$. Let $S$ be a subset of the variable nodes of $\mathcal{G}$. If a $(p, q)$-matching on $S$ exists, then $\mathcal{C}$ has FCP$(S, C)$ for a robustness factor $C$ determined by the parameters of the matching.

The work [14] provides a probabilistic tool for establishing the existence of $(p, q)$-matchings in regular bipartite expander graphs, which helps answer the question of how large an error set LP decoding can fix. For example, for a random LDPC(8,16) code, the probabilistic analysis implies that with high probability a fraction 0.002 of errors is recoverable by the LP decoder. Moreover, taking the specifications of the matching that leads to this conclusion and applying Lemma 5.2, it turns out that for an error set of size $0.002n$ the robustness factor is at least $C = 1.3$, i.e., the code has FCP$(0.002n, 1.3)$.



5.3 Weak LP Robustness for Codes with $\Omega(\log\log n)$ Girth

Recall that $\mathcal{G} = (X_v, X_c, \mathcal{E})$ denotes the factor graph of the parity check matrix $H$ (or of the code $\mathcal{C}$), where $X_v$ and $X_c$ are the sets of variable and check nodes, respectively, and $\mathcal{E}$ is the set of edges. Also recall that the girth of $\mathcal{G}$ is defined as the length of the shortest cycle in $\mathcal{G}$. Without loss of generality, we assume that $X_v = \{v_1, v_2, \dots, v_n\}$, where $v_i$ is the variable node corresponding to the $i$-th bit of the codeword. Let $T < \frac{1}{4}\mathrm{girth}(\mathcal{G})$ be fixed. The following notions are defined in [15].

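Definition 5. A tree $\mathcal{T}$ of height $2T$ is called a skinny subtree of $\mathcal{G}$ if it is rooted at some variable node $v_{i_0}$, for every variable node $v$ in $\mathcal{T}$ all the neighboring check nodes of $v$ in $\mathcal{G}$ are also present in $\mathcal{T}$, and for every check node $c$ in $\mathcal{T}$ exactly two neighboring variable nodes of $c$ in $\mathcal{G}$ are present in $\mathcal{T}$.

Definition 6. Let $w \in [0,1]^T$ be a fixed vector. A vector $\beta^{(w)}$ is called a minimal $T$-local deviation if there is a skinny subtree $\mathcal{T}$ of $\mathcal{G}$ of height $2T$ such that for every variable node $v_i$, $1 \le i \le n$,

$$\beta^{(w)}_i = \begin{cases} w_{h_i}, & v_i \in \mathcal{T}, \\ 0, & \text{otherwise}, \end{cases}$$

where $h_i = \frac{1}{2} d(v_{i_0}, v_i)$ and $d(\cdot, \cdot)$ denotes the graph distance.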

The key to the derivations of [15] is the following lemma.

Lemma 5.3 (Lemma 1 of [15]). For any vector $z \in \mathcal{P}$ and any positive vector $w \in [0,1]^T$, there exists a distribution on the minimal $T$-local deviations $\beta^{(w)}$ such that

$$\mathbb{E}\,\beta^{(w)} = \alpha z,$$

where $0 < \alpha \le 1$.

Lemma 5.3 has the following interpretation. If a linear property holds for all minimal $T$-local deviations (e.g., $f(\beta^{(w)}) \ge 0$, where $f(\cdot)$ is a linear functional), then it also holds for all pseudocodewords (i.e., $f(z) \ge 0$ for all $z \in \mathcal{P}$). Interestingly, the robustness of LP decoding for a given set of bit flips $S$ has a linear certificate, namely FCP$(S, C)$.¹ In other words, if we define

$$f^{(C)}_S(z) = -C \sum_{i \in S} z_i + \sum_{i \in S^c} z_i,$$

then FCP$(S, C)$ holds if and only if $f^{(C)}_S(z) > 0$ for every nonzero $z \in \mathcal{P}$. Therefore, according to Lemma 5.3, it suffices that the condition hold for all minimal $T$-local deviations. More precisely, for arbitrary $C > 1$, if $f^{(C)}_S(\beta^{(w)}) > 0$ for all minimal $T$-local deviations $\beta^{(w)}$, then the code has the FCP$(S, C)$ property. This simple observation helps us extend the probabilistic analysis of [15] to robustness results for LP decoding. The resulting key theorem is stated below; its proof can be found in Appendix D. In order to state the theorem, we first define $\eta$ to be a random variable that takes the value $-C$ with probability $p$ and the value $1$ with probability $1 - p$. Also, define the sequences of random variables $X_i$, $Y_i$, $i \ge 0$, as follows:

$$\begin{aligned}
Y_0 &= \eta, \\
X_i &= \min\{Y_i^{(1)}, \dots, Y_i^{(d_c - 1)}\}, && \forall i \ge 0, \\
Y_i &= 2^i \eta + X_{i-1}^{(1)} + \dots + X_{i-1}^{(d_v - 1)}, && \forall i > 0,
\end{aligned} \tag{18}$$

where $X_i^{(k)}$ and $Y_i^{(k)}$ denote independent copies of $X_i$ and $Y_i$, respectively.

¹ Note that this is only true for bit flipping channels, where the output alphabet is the binary field.

Theorem 5.2. Let $0 < p < 1/2$ be the probability of a bit flip, and let $S$ be the random set of flipped bits. If for some $j \in \mathbb{N}$,

$$c := \gamma^{1/(d_v - 2)} \min_{t \ge 0} \mathbb{E}\, e^{-t X_j} < 1,$$

where $\gamma < 1$ is a constant depending on $d_c$, $d_v$ and $p$, then with probability at least $1 - O(n)\, c^{(d_v - 1)^{T - j - 1}}$ the code $\mathcal{C}$ has FCP$(S, C)$, where $T$ is any integer with $j < T < \frac{1}{4}\mathrm{girth}(\mathcal{G})$.

For $d_c = 6$ and $d_v = 3$, a lower bound on the robustness parameter $C$ resulting from Theorem 5.2 is plotted against the bit flip probability $p$ in Figure 1.
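The quantity $\min_{t \ge 0}\mathbb{E}\,e^{-tX_j}$ in Theorem 5.2 involves the distribution of $X_j$ from the recursion (18), which is hard to obtain in closed form. The following minimal sketch assumes NumPy and approximates this quantity numerically. It represents each level by a sampled population, resampled from the previous level's samples in a density-evolution-style approximation, and then runs a grid search over $t$. The population size, the $t$ grid and the function names are assumptions for illustration. This is not claimed to be how Figure 1 was produced, and the constant $\gamma$ of the theorem is not computed here.

```python
import numpy as np


def sample_X(j, p, C, d_v, d_c, num=100_000, seed=0):
    """Approximate samples of X_j from recursion (18) with a resampled population."""
    rng = np.random.default_rng(seed)

    def eta(size):
        # eta = -C with probability p and +1 with probability 1 - p
        return np.where(rng.random(size) < p, -C, 1.0)

    Y = eta(num)                                              # Y_0 = eta
    for i in range(j + 1):
        # X_i = min of d_c - 1 independent copies of Y_i
        X = Y[rng.integers(0, num, size=(num, d_c - 1))].min(axis=1)
        if i == j:
            return X
        # Y_{i+1} = 2^(i+1) eta + sum of d_v - 1 independent copies of X_i
        Y = 2.0 ** (i + 1) * eta(num) + X[rng.integers(0, num, size=(num, d_v - 1))].sum(axis=1)


def min_mgf(X, t_grid=np.linspace(0.0, 2.0, 201)):
    """Grid approximation of min_{t >= 0} E[exp(-t X)]."""
    return min(float(np.mean(np.exp(-t * X))) for t in t_grid)


X3 = sample_X(j=3, p=0.02, C=1.2, d_v=3, d_c=6)
print(min_mgf(X3))
```

Because $t = 0$ is on the grid, the returned value never exceeds 1; a value well below 1 indicates that the first factor in the theorem's condition is favorable for the chosen $(p, C)$.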



6 Implications of LP robustness 
6.1 Mismatch Tolerance 

One of the direct consequences of the robustness of LP decoding is that if there is a slight mismatch in the implementation of the LP decoder, its performance does not degrade significantly. More formally, suppose that due to noise, quantization or some other factor, a mismatched log-likelihood vector $\gamma' = \gamma + \Delta\gamma$ is used in the LP implementation. We refer to such a decoder as a mismatched LP decoder. Since the channel is a BSC, the entries of $\gamma$ all have the same magnitude $g$. We also define $\delta = \max_i |\Delta\gamma_i|$, and assume that $\delta < g$. We can prove the following theorem.

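Theorem 6.1. Suppose that $S$ is the set of bit errors, and let $C = \frac{g + \delta}{g - \delta}$. If $\mathcal{C}$ has FCP$(S, C)$, then the mismatched LP decoder corrects all errors and recovers the original codeword.

Proof. We assume without loss of generality that the all-zero codeword is transmitted. We show that if FCP$(S, C)$ holds, then the all-zero codeword is the unique minimum-cost vector in the polytope $\mathcal{P}$. Suppose $\omega$ is a nonzero vector in the fundamental cone $\mathcal{K}$. We begin with the definition of FCP$(S, C)$ and write

$$-C \sum_{i \in S} \omega_i + \sum_{i \in S^c} \omega_i > 0. \tag{19}$$

Multiplying both sides by $(g - \delta)$ and using $C(g - \delta) = g + \delta$, we get

$$-\sum_{i \in S} (g + \delta)\,\omega_i + \sum_{i \in S^c} (g - \delta)\,\omega_i > 0. \tag{20}$$

We also know from the definition of $\delta$ that $\gamma'_i \ge g - \delta$ for $i \in S^c$ and $\gamma'_i \ge -g - \delta$ for $i \in S$, and that $\omega \ge 0$. Therefore

$$\sum_{i \in S \cup S^c} \gamma'_i\, \omega_i \ge -\sum_{i \in S} (g + \delta)\,\omega_i + \sum_{i \in S^c} (g - \delta)\,\omega_i > 0. \tag{21}$$

Since $\mathcal{P} \subseteq \mathcal{K}$ and the all-zero codeword has cost zero, this proves that the all-zero codeword is the unique minimum-cost solution of the mismatched LP. ■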
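The algebra of (19)–(21) can be sanity-checked numerically. The following minimal sketch assumes NumPy. It samples nonnegative vectors satisfying (19), perturbs the BSC log-likelihoods by at most $\delta$, and confirms (21). Membership in $\mathcal{K}$ is not needed for this step of the argument, only $\omega \ge 0$ and (19). The sizes, the choice $g = 1$, $\delta = 0.3$, and the uniform perturbation model are arbitrary choices for illustration.

```python
import numpy as np


def mismatch_C(g, delta):
    """Robustness factor required by Theorem 6.1 for |Delta gamma_i| <= delta < g."""
    assert 0 <= delta < g
    return (g + delta) / (g - delta)


rng = np.random.default_rng(1)
n, g, delta = 20, 1.0, 0.3
C = mismatch_C(g, delta)
S = np.arange(3)                                   # indices of the flipped bits
gamma = np.full(n, g)
gamma[S] = -g                                      # BSC LLRs, all-zero codeword sent
checked = 0
while checked < 1000:
    omega = rng.random(n)                          # nonnegative vector
    if -C * omega[S].sum() + np.delete(omega, S).sum() <= 0:
        continue                                   # keep only omega satisfying (19)
    gamma_mis = gamma + rng.uniform(-delta, delta, n)   # mismatched LLRs, |error| <= delta
    assert gamma_mis @ omega > 0                   # conclusion (21)
    checked += 1
```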



6.2 Pseudocodewords and High Error Rate Subsets 

We showed in Section 4 that for an appropriate code $\mathcal{C}$, even when the LP decoder fails to recover the actual codeword from the output of a BSC, the $\ell_1$ distance between the obtained pseudocodeword and the actual codeword can be bounded by a finite multiple of the number of excess errors (see (11)). We now show that this property allows us to use the output of the LP decoder to find a high error rate subset of the bits of linear size, namely a subset of bits over which the fraction of errors is significantly larger than the fraction of errors in the entire received word. Obtaining such an importance subset is crucial, since it provides additional information about a significant proportion of the bits, which can be used to improve the decoder's performance. For instance, one can impose additional soft or hard constraints on the importance subset and solve a constrained linear program, or apply other post-processing algorithms following the initial linear program. This forms the idea for the proposed iterative LP decoding algorithm, which is outlined in Section 7.

Figure 1: Approximate upper bound for the robustness factor $C$ as a function of the error probability $p$ for $d_c = 6$ and $d_v = 3$, based on Theorem 5.2.

Consider a code $\mathcal{C}$ of length $n$ and rate $R$, and a codeword $x^{(c)}$ from $\mathcal{C}$ transmitted through a bit flipping channel. Suppose that a set $K$ of the bits is flipped, where the cardinality of $K$ is $(1 + \epsilon)p^*n$ for some $0 < p^* < 1$ and $\epsilon > 0$. Denote the received vector by $x^{(r)}$. We are interested in the case where LP decoding fails, so the LP minimizer $x^{(p)}$ is a fractional pseudocodeword. However, the size of the error set is only slightly larger than the correctable size $p^*n$. In other words, we assume that for some subset $K_1 \subset K$ of size $p^*n$, the code has FCP$(K_1, C)$ for some $C > 1$. We show next that the index set of the $(1 + \epsilon)p^*n$ largest entries of the vector $x^{(r)} - x^{(p)}$ has a significant overlap with $K$ with high probability, and is thus a high error rate subset of entries. The following theorem formalizes this claim.

Theorem 6.2. Suppose that a codeword $x^{(c)}$ is transmitted through a bit flipping channel, and the output $x^{(r)}$ differs from the input in a set $K$ of the bits with $|K| = (1 + \epsilon)p^*n$, for some $0 < p^* < 1$ and $\epsilon > 0$. Also suppose that for a subset $K_1 \subset K$ of size $p^*n$, FCP$(K_1, C)$ holds for some $C > 1$, and that the LP minimizer is the pseudocodeword $x^{(p)}$. If $L$ is the set of the $(1 + \epsilon)p^*n$ largest entries of the vector $x^{(r)} - x^{(p)}$ in magnitude, then the fraction of errors in $x^{(r)}$ over the set $L$ is at least $1 - 2\frac{C+1}{C-1}\epsilon$.

Before proving this theorem, we state the following definition and lemma. 

Definition 7. Let $x \in \mathbb{R}^n$ be a $k$-sparse vector. For $\Delta > 0$, we define $W(x, \Delta)$ to be the size of the largest subset of nonzero entries of $x$ that has an $\ell_1$ norm less than or equal to $\Delta$, i.e.,

$$W(x, \Delta) := \max\{|S| : S \subseteq \mathrm{supp}(x),\ \|x_S\|_1 \le \Delta\}. \qquad (22)$$
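Because only the number of entries matters, $W(x, \Delta)$ in (22) can be computed greedily: take the nonzero entries in increasing order of magnitude until the $\ell_1$ budget $\Delta$ is exhausted. The following is a minimal Python sketch of that computation; the function name `w_size` is our own and not part of the original text.

```python
def w_size(x, delta):
    """Size of the largest S within supp(x) with ||x_S||_1 <= delta, as in (22).

    Greedy is optimal: to fit as many entries as possible under an l1 budget,
    take the nonzero entries in increasing order of magnitude.
    """
    magnitudes = sorted(abs(v) for v in x if v != 0)
    total, count = 0.0, 0
    for m in magnitudes:
        if total + m > delta:
            break
        total += m
        count += 1
    return count
```

For a vector whose nonzero entries are all $\pm 1$ (the case used in the proof of Theorem 6.2), this returns $\min(k, \lfloor \Delta \rfloor)$.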
The following lemma is proven in [20].

Lemma 6.1 (Lemma 1 of [20]). Let $x$ be a $k$-sparse vector and $\hat{x}$ be another vector. Also, let $K$ be the support set of $x$ and $L$ be the $k$-support set of $\hat{x}$, namely the set of the $k$ largest entries of $\hat{x}$ in magnitude. If $d = \|x - \hat{x}\|_1$, then

$$|K \cap L| \ge k - W(x, d). \qquad (23)$$
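Lemma 6.1 is used below only for vectors $x$ whose nonzero entries are $\pm 1$; in that case $W(x, d) \le d$, and (23) gives $|K \cap L| \ge k - \|x - \hat{x}\|_1$. The sketch below checks this consequence on random instances. The helper names, the sparse perturbation model used to generate $\hat{x}$, and all parameter values are illustrative assumptions.

```python
import random


def k_support(v, k):
    """Indices of the k largest entries of v in magnitude (the k-support set)."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    return set(order[:k])


def check_lemma_61(n=200, k=20, n_perturbed=6, trials=500, seed=1):
    """Return True if |K & L| >= k - ||x - x_hat||_1 held in every trial.

    x is k-sparse with +/-1 nonzero entries; x_hat is x with a few randomly
    chosen coordinates perturbed, so that L can differ from K.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        K = set(rng.sample(range(n), k))
        x = [rng.choice((-1.0, 1.0)) if i in K else 0.0 for i in range(n)]
        x_hat = list(x)
        for i in rng.sample(range(n), n_perturbed):
            x_hat[i] += rng.uniform(-1.5, 1.5)
        d = sum(abs(a - b) for a, b in zip(x, x_hat))
        L = k_support(x_hat, k)
        if len(K & L) < k - d:
            return False
    return True
```

Perturbing only a few coordinates keeps $d$ small, so the bound is not trivially satisfied while swaps between $K$ and $L$ still occur.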



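Proof of Theorem 6.2. Define $k = p^*(1 + \epsilon)n$, and apply Lemma 6.1 to the $k$-sparse vector $x^{(r)} - x^{(0)}$ and the vector $x^{(r)} - x^{(p)}$. If $L$ is the index set of the largest $k$ entries of $x^{(r)} - x^{(p)}$ in magnitude, then from Lemma 6.1 we have

$$|K \cap L| \ge k - W\big(x^{(r)} - x^{(0)}, \Delta\big), \qquad (24)$$

where $\Delta = \|x^{(p)} - x^{(0)}\|_1$. Since $x^{(r)} - x^{(0)}$ has only $\pm 1$ nonzero entries, any subset of its support with $\ell_1$ norm at most $\Delta$ has at most $\Delta$ elements, so $W(x^{(r)} - x^{(0)}, \Delta) \le \Delta$ and (24) can be written as

$$|K \cap L| \ge k - \|x^{(p)} - x^{(0)}\|_1. \qquad (25)$$

We use the inequality in (11) to further lower bound the right-hand side of (25). Recall that $K_1 \subset K$ is such that $\mathcal{C}$ has $\mathrm{FCP}(K_1, C)$. Therefore, we can write:

$$|K \cap L| \ge k - \frac{2(C+1)}{C-1}\big\|(x^{(r)} - x^{(0)})_{K_1^c}\big\|_1 \qquad (26)$$
$$= k - \frac{2(C+1)}{C-1}\,(k - p^* n). \qquad (27)$$

Here (27) holds because $x^{(r)} - x^{(0)}$ restricted to $K_1^c$ has exactly $|K \setminus K_1| = k - p^* n$ entries equal to $\pm 1$. Dividing both sides by $|K| = k$ and using $(k - p^* n)/k = \epsilon/(1 + \epsilon) \le \epsilon$, we conclude that at least a fraction $1 - \frac{2(C+1)}{C-1}\epsilon$ of the set $L$ (which also has $k$ elements) are flipped bits. ■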
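As a quick numeric reading of the bound just derived, the sketch below evaluates both the exact ratio obtained from (27), $1 - \frac{2(C+1)}{C-1}\cdot\frac{\epsilon}{1+\epsilon}$, and the simplified bound $1 - \frac{2(C+1)}{C-1}\epsilon$ stated in Theorem 6.2. The function names are hypothetical; note that the bound is informative only when $\epsilon < \frac{C-1}{2(C+1)}$.

```python
def her_fraction_exact(C, eps):
    """(27) divided by k = p*(1+eps)n; the p*n factors cancel."""
    if C <= 1 or eps <= 0:
        raise ValueError("requires C > 1 and eps > 0")
    return 1.0 - 2.0 * (C + 1.0) / (C - 1.0) * eps / (1.0 + eps)


def her_fraction_bound(C, eps):
    """Simplified lower bound of Theorem 6.2 (uses eps/(1+eps) <= eps)."""
    if C <= 1 or eps <= 0:
        raise ValueError("requires C > 1 and eps > 0")
    return 1.0 - 2.0 * (C + 1.0) / (C - 1.0) * eps
```

For example, $C = 3$ and $\epsilon = 0.05$ give $\frac{2(C+1)}{C-1} = 4$ and a bound of $0.8$: at least 80% of the positions in $L$ are flipped bits.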



7 Iterative Reweighted LP Algorithm and Improved Strong Threshold

First, we briefly define different recovery thresholds for LP decoding, to make the statements that follow precise. In general, the actual weak and strong thresholds for a given class of linear codes might be unknown, and the existing thresholds only provide lower bounds on these quantities. For expander codes, for instance, the size of the error set that can be recovered via LP can be lower bounded by the size of the set for which a dual witness exists [13]. Since a dual witness is only a sufficient condition for the success of LP decoding, the actual thresholds are generally expected to be higher. However, to date, the best achievable thresholds for LP decoding of expander codes are those given by the dual feasibility arguments. Therefore, we also consider thresholds associated with those limits, namely the "provable" thresholds. Specifically, we define the following four thresholds for LP decoding on a given code $\mathcal{C}$ that has regular variable and check degrees $d_v$ and $d_c$.

Definition 8 (Recovery thresholds). The strong recovery threshold, denoted by $p^*$, is the largest fraction such that every set of size $p^* n$ is recoverable via LP decoding. The weak recovery threshold, denoted by $p^*_w$, is the largest fraction such that almost all sets of size $p^*_w n$ are recoverable via LP decoding. We define $p^*_{sd}$ to be the maximum provable strong threshold achieved by a dual feasible set $\{\tau_{ij}\}$, i.e., for every set of size $p^*_{sd} n$ a dual feasible set exists. Similarly, $p^*_{wd}$ is the provable weak threshold, i.e., for almost all sets of size $p^*_{wd} n$, a dual feasible set $\{\tau_{ij}\}$ exists.



As sketched in Theorem 6.2, by examining the deviation between the LP optimum (the pseudocodeword) and the received vector, it is possible to identify a high error rate (HER) subset of bits, in which the fraction of bit flips is higher than the overall probability of error, and higher than the fraction of errors in the complement of the HER set. One way to exploit this imbalance is a weighted LP scheme, outlined in the following iterative algorithm.

Algorithm 1.

1. Run LP decoding. If the output is integral, terminate; otherwise proceed.

2. Take the fractional pseudocodeword $x^{(p)}$ from the LP decoder, and construct the deviation vector $x^{(d)} = x^{(r)} - x^{(p)}$.

3. Sort the entries of $x^{(d)}$ in terms of absolute value, and denote by $L$ the index set of its largest $pn$ entries.

4. Solve the following weighted LP over the same polytope as in LP decoding:

$$\min_{x}\ \lambda_1 \|(x - y)_L\|_1 + \lambda_2 \|(x - y)_{L^c}\|_1, \qquad (28)$$

where $y$ is the received vector and $\lambda_1 < 0$ and $\lambda_2 > 0$ are fixed parameters.
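The bookkeeping in steps 2 to 4 is simple to express in code. The sketch below is a minimal Python illustration under stated assumptions: following Theorem 6.2 and the proof of Theorem 7.1, $L$ is taken to be the $pn$ most deviated positions; $y$ is the binary received word; and the LP solve over the decoding polytope itself is not shown. Because $y_i \in \{0, 1\}$ and $0 \le x_i \le 1$, each term $|x_i - y_i|$ equals $x_i$ or $1 - x_i$, so (28) is a linear objective $c^\top x + c_0$ that can be handed to any LP solver. The function names `her_set` and `reweighted_cost` are hypothetical.

```python
def her_set(x_received, x_pseudo, size):
    """Steps 2-3: indices of the `size` entries of x^(d) = x^(r) - x^(p)
    that are largest in absolute value (ties broken by index order)."""
    dev = [r - q for r, q in zip(x_received, x_pseudo)]
    order = sorted(range(len(dev)), key=lambda i: abs(dev[i]), reverse=True)
    return set(order[:size])


def reweighted_cost(y, L, lam1, lam2):
    """Return (c, c0) with lam1*||(x-y)_L||_1 + lam2*||(x-y)_{L^c}||_1 = c.x + c0
    for every x in [0, 1]^n, assuming y is a 0/1 vector."""
    c, c0 = [], 0.0
    for i, yi in enumerate(y):
        lam = lam1 if i in L else lam2
        if yi == 0:  # |x_i - 0| = x_i
            c.append(lam)
        else:  # |x_i - 1| = 1 - x_i
            c.append(-lam)
            c0 += lam
    return c, c0
```

With $\lambda_1 < 0$, positions in $L$ are rewarded for moving away from the received bit, which is what allows the second LP to flip bits that the HER set flags.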

Algorithm 1 is only twice as complex as LP decoding. We prove in the following that Algorithm 1 has strictly improved provable strong and weak recovery thresholds compared with the dual feasibility thresholds $p^*_{sd}$ and $p^*_{wd}$ (recall the definitions of $p^*_{sd}$ and $p^*_{wd}$ from Definition 8).
Theorem 7.1. For any code $\mathcal{C}$, there exist $\epsilon_1 > 0$, $\epsilon_2 > 0$, $\lambda_1 < 0$ and $\lambda_2 > 0$ so that every error set of size $(1 + \epsilon_1)p^*_{sd} n$, and almost all error sets of size $(1 + \epsilon_2)p^*_{wd} n$, can be corrected by Algorithm 1.

We start with the following lemma.

Lemma 7.1. Suppose a codeword $x$ is transmitted through a binary channel. Also suppose that the bits of $x$ can be divided into two sets $L$ and $L^c$, so that at least a fraction $p_1$ of the bits in $L$ are flipped, and at most a fraction $p_2$ of the bits in $L^c$ are flipped. Then the following weighted LP decoder

$$\min_{x}\ -\|(x - y)_L\|_1 + \|(x - y)_{L^c}\|_1 \qquad (29)$$

can recover $x$, provided that

$$(1 - p_1)|L| + p_2|L^c| < p^*_{sd} n. \qquad (30)$$



Proof. We assume without loss of generality that the all-zero codeword has been transmitted, and prove that there exists a feasible dual (Definition 3) for the LP decoder (29). The feasible dual must satisfy condition (i) of Definition 3 for all check nodes, and in addition:

$$\sum_{j \in N(v_i)} \tau_{ij} < \begin{cases} 1 & i \in L \cap S \\ -1 & i \in L \cap S^c \\ -1 & i \in L^c \cap S \\ 1 & i \in L^c \cap S^c \end{cases} \qquad (31)$$

where $S$ is the set of flipped bits. One can note that the conditions of (31) are equivalent to the $\tau_{ij}$'s being a feasible dual set for the ordinary LP decoder when the error set is $S_1 = (L \cap S^c) \cup (L^c \cap S)$. By the assumptions of the lemma, $|L \cap S^c| \le (1 - p_1)|L|$ and $|L^c \cap S| \le p_2|L^c|$, so (30) implies $|S_1| < p^*_{sd} n$. Therefore, from the definition of $p^*_{sd}$, such a feasible dual set exists. This completes the proof of the lemma. ■
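For intuition, the effective error set $S_1 = (L \cap S^c) \cup (L^c \cap S)$ in the proof above is simply the symmetric difference of $L$ and $S$, and when $p_1$ and $p_2$ are the exact flip fractions, the left-hand side of (30) equals $|S_1|$. The Python sketch below computes both quantities; the helper names and the toy sets are illustrative.

```python
def effective_error_set(L, S):
    """S1 = (L minus S) union (S minus L): the symmetric difference of L and S."""
    return (L - S) | (S - L)


def lemma71_condition(L, S, n, p_sd):
    """Check (30) with p1, p2 set to the exact flip fractions in L and L^c."""
    if not 0 < len(L) < n:
        raise ValueError("L must be a nonempty proper subset of {0, ..., n-1}")
    p1 = len(L & S) / len(L)
    p2 = len(S - L) / (n - len(L))
    lhs = (1 - p1) * len(L) + p2 * (n - len(L))
    return lhs < p_sd * n
```

In the toy case $L = \{0,1,2,3\}$, $S = \{0,1,2,9\}$, $n = 20$, the effective error set is $\{3, 9\}$, so (30) holds exactly when $p^*_{sd} n > 2$.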



Proof of Theorem 7.1. We set $\lambda_1 = -1$ and $\lambda_2 = 1$. Suppose without loss of generality that the all-zero codeword has been transmitted, and that the received binary vector $x^{(r)}$ has $pn$ errors, where $p = (1 + \epsilon_0)p^*_{sd}$. From Theorem 5.1, $\mathcal{C}$ has $\mathrm{FCP}(p^*_{sd} n, C)$ for some $C > 1$. Therefore, if we apply Theorem 6.2 to the output of LP, namely $x^{(p)}$, we conclude that the set $L$ of the $pn$ most deviated bits in $x^{(p)}$ with respect to $x^{(r)}$, and the set $S$ of the errors in $x^{(r)}$, have an overlap of at least a fraction $1 - \frac{2(C+1)}{C-1}\epsilon_0$. Define $p_1 = \frac{|L \cap S|}{|L|}$ and $p_2 = \frac{|L^c \cap S|}{|L^c|}$. We must have

$$p_1 \ge 1 - \frac{2(C+1)}{C-1}\epsilon_0, \qquad (32)$$
$$p_1|L| + p_2|L^c| = pn. \qquad (33)$$

Since $|L| = pn$, (33) gives $p_2|L^c| = (1 - p_1)pn$. Therefore, as $\epsilon_0 \to 0$, $p_1 \to 1$ and $p_2 \to 0$. So, for some small enough $\epsilon_0$, the following will eventually hold:

$$(1 - p_1)|L| + p_2|L^c| < p^*_{sd} n. \qquad (34)$$

Thus, according to Lemma 7.1, the weighted LP step of Algorithm 1 corrects all errors. Similarly, if a random set of $pn$ bits is flipped, where $p = (1 + \epsilon_2)p^*_{wd}$, from Lemma 5.2 we conclude that with high probability there exists a $C > 1$ so that $\mathrm{FCP}(K_1, C)$ holds for a random subset $K_1$ of the bit errors of size $p^*_{wd} n$. Therefore, using Theorem 6.2, it follows that the set $L$ of the $pn$ most deviated bits in $x^{(p)}$ with respect to $x^{(r)}$, and the set of errors in $x^{(r)}$, have an overlap fraction of at least $1 - \frac{2(C+1)}{C-1}\epsilon_2$. The remainder of the proof is the same as in the previous case, i.e., it follows by applying Lemma 7.1. ■
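The phrase "for some small enough $\epsilon_0$" can be made explicit. Writing $K_C = \frac{2(C+1)}{C-1}$, the identity $p_2|L^c| = (1 - p_1)pn$ from (33) turns the left-hand side of (34) into $2(1 - p_1)pn$, and (32) together with $p = (1 + \epsilon_0)p^*_{sd}$ bounds it by $2K_C\epsilon_0(1 + \epsilon_0)p^*_{sd}n$. Hence (34) holds whenever $2K_C\epsilon_0(1 + \epsilon_0) < 1$, i.e. $\epsilon_0 < \frac{1}{2}\big(\sqrt{1 + 2/K_C} - 1\big)$. This is our own sufficient condition derived from (32) and (33), not a constant stated in the text; the Python sketch below (hypothetical function name) evaluates it.

```python
import math


def eps0_limit(C):
    """Supremum of eps0 with 2*K*eps0*(1 + eps0) < 1, where K = 2(C+1)/(C-1).

    Any eps0 strictly below this value makes (34) follow from (32)-(33).
    """
    if C <= 1:
        raise ValueError("requires C > 1")
    K = 2.0 * (C + 1.0) / (C - 1.0)
    return 0.5 * (math.sqrt(1.0 + 2.0 / K) - 1.0)
```

For example, $C = 3$ gives $K_C = 4$ and a limit of about 0.112; as $C \to \infty$, $K_C \to 2$ and the limit approaches $(\sqrt{2} - 1)/2 \approx 0.207$.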



8 Simulations 

We have implemented Algorithm 1 on a random LDPC code of length $n = 1000$ and rate $R = 1/4$, and have compared the results with other existing methods. The variable node degree is $d_v = 3$, and thus $d_c = 4$. The algorithm is compared with the mixed integer method of Draper and Yedidia [27], and the random facet guessing algorithm of [28]. The mixed integer algorithm re-runs LP decoding with integer constraints on a small subset of "least certain" bits, namely the positions where the LP minimal pseudocodeword entries are closest to 0.5. We have taken the size of the constrained subset to be $M = 5$, which means the number of extra iterations is $2^5 = 32$ for the mixed integer method. We also choose to run 20 extra random iterations for facet guessing. In random facet guessing, a face (facet) of the polytope $\mathcal{P}$ is selected at random among all the faces on which the LP minimal pseudocodeword does not reside. Then, the LP decoder is re-run with the additional constraint that the solution lies on the selected face. In contrast, Algorithm 1 has only one extra iteration. All methods are simulated in MATLAB, where the LP decoder is implemented via the CVX toolbox [29]. We have plotted the BER curves versus the probability of error $p$ in Figure 2. For Algorithm 1, for each $p$, we have experimentally found the optimal $\lambda_1$ and $\lambda_2$ by choosing the values that on average result in the best performance. In most cases the chosen values were in the ranges $-3 < \lambda_1 < -0.5$ and $1 < \lambda_2 < 3$. Observe the superior BER performance of Algorithm 1, which becomes more significant for smaller values of $p$. For $p = 0.11$, the BER improvement of the reweighted LP method is at least one order of magnitude. In our preliminary experimental evaluation we observe that the BER curves eventually collapse into the same curve as the LP curve, except for the reweighted LP algorithm, which indicates that the empirical thresholds of Algorithm 1 are better than those of the LP decoder and existing polynomial-time post-processing methods.

Figure 2: BER curves as a function of channel flip probability $p$, for LP decoding and different iterative schemes: random facet guessing of [28], the mixed integer method of [27], and the suggested iterative reweighted LP of Algorithm 1. The code is a random LDPC(3,4) of length $n = 1000$.
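The count of 32 extra runs for the mixed integer method corresponds to $2^M$ with $M = 5$. One reading consistent with that count (an assumption on our part, not a description of the implementation in [27]) is that every 0/1 assignment of the $M$ least certain bits is tried once. The Python sketch below shows the subset selection and that enumeration; the helper names are hypothetical.

```python
from itertools import product


def least_certain(x_pseudo, M):
    """Indices of the M pseudocodeword entries closest to 0.5."""
    order = sorted(range(len(x_pseudo)), key=lambda i: abs(x_pseudo[i] - 0.5))
    return order[:M]


def integer_patterns(indices):
    """All 2^M ways to fix the chosen bits to 0/1, one extra LP run each."""
    return [dict(zip(indices, bits)) for bits in product((0, 1), repeat=len(indices))]
```

With $M = 5$ this yields 32 constraint patterns, compared with the single extra LP solved by Algorithm 1.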
A Proof of Lemma 5.1



We first prove the following lemma. 

Lemma A.1. Suppose $\{\tau_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$ is a set of feasible dual variables on the edges of the factor graph $\mathcal{G}$ of the code $\mathcal{C}$, for some arbitrary log-likelihood vector $\gamma$. Then for every vector $w \in \mathcal{K}(\mathcal{C})$ and every check node $c_j$, the following holds:

$$\sum_{i \in N(c_j)} w_i \tau_{ij} \ge 0. \qquad (35)$$

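Proof. We only use condition (i) of a feasible set of dual variables. Note that among the variable nodes in $N(c_j)$, there can be at most one node $v_i$ with $\tau_{ij} < 0$; if there is none, every term in (35) is nonnegative (since $w \ge 0$) and the claim is immediate. Otherwise, let $v_i$ be such a variable node. From the definition of $\mathcal{K}$ we can write

$$w_i \le \sum_{i' \in N(c_j) \setminus \{i\}} w_{i'},$$

or equivalently (multiplying both sides by $|\tau_{ij}| = -\tau_{ij}$):

$$\tau_{ij} w_i + \sum_{i' \in N(c_j) \setminus \{i\}} |\tau_{ij}|\, w_{i'} \ge 0. \qquad (36)$$

Moreover, we know that $\tau_{ij} + \tau_{i'j} \ge 0$ for $i' \ne i$, from condition (i) of dual feasibility, i.e., $\tau_{i'j} \ge |\tau_{ij}|$. Therefore, replacing $|\tau_{ij}|$ with $\tau_{i'j}$ for each $i' \ne i$ does not decrease the left-hand side of (36), and thus

$$\sum_{i \in N(c_j)} w_i \tau_{ij} \ge 0. \qquad ■$$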
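The two ingredients of Lemma A.1 are easy to check numerically: membership in the fundamental cone (nonnegativity plus, for each check, $w_i \le \sum_{i' \ne i} w_{i'}$) and condition (i) in the pairwise form $\tau_{ij} + \tau_{i'j} \ge 0$ used above. The Python sketch below implements both, together with the left-hand side of (35). It assumes $H$ is given as a list of 0/1 rows (one per check) and $\tau$ is indexed as `tau[i][j]`; all names are illustrative.

```python
def in_fundamental_cone(H, w, tol=1e-12):
    """True if w >= 0 and, for every check row of H and every neighbor i,
    w_i <= sum of w over the other neighbors of that check."""
    if any(wi < -tol for wi in w):
        return False
    for row in H:
        nbrs = [i for i, h in enumerate(row) if h]
        total = sum(w[i] for i in nbrs)
        if any(w[i] > total - w[i] + tol for i in nbrs):
            return False
    return True


def satisfies_condition_i(H, tau, tol=1e-12):
    """Pairwise form of condition (i): tau_ij + tau_i'j >= 0 for i != i'.
    It suffices to test the two smallest labels on each check."""
    for j, row in enumerate(H):
        vals = sorted(tau[i][j] for i, h in enumerate(row) if h)
        if len(vals) >= 2 and vals[0] + vals[1] < -tol:
            return False
    return True


def check_sum(H, tau, w, j):
    """Left-hand side of (35) for check node c_j."""
    return sum(w[i] * tau[i][j] for i, h in enumerate(H[j]) if h)
```

Sampling random $w$ in the cone and random labels that satisfy condition (i) and evaluating `check_sum` never produces a negative value, as Lemma A.1 predicts.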



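We now invoke Lemma A.1: for every check node $c_j$, $\sum_{v_i \in N(c_j)} w_i \tau_{ij} \ge 0$. If we sum these inequalities over all check nodes $c_j$ we obtain:

$$\sum_{c_j \in X_c} \sum_{v_i \in N(c_j)} w_i \tau_{ij} = \sum_{v_i \in X_v} w_i \sum_{c_j \in N(v_i)} \tau_{ij} \ge 0,$$

where $X_v$ and $X_c$ are the sets of variable and check nodes, respectively. Since the $\tau_{ij}$'s are feasible dual variables, from condition (ii) of feasibility (Definition 3) we must have $\sum_{c_j \in N(v_i)} \tau_{ij} < \gamma_i$. Since $w \ge 0$, it then follows that, for every nonzero $w \in \mathcal{K}(\mathcal{C})$,

$$\sum_i \gamma_i w_i > 0.$$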



B Proof of Theorem 5.1 



We essentially repeat the argument of [13] with some slight adjustments. Let $S$ be the set of flipped bits, or interchangeably the set of corresponding variable nodes in the factor graph $\mathcal{G}$ (we use $v_i$ to refer to the variable node corresponding to the $i$-th bit).

Definition 9 ($(\delta, \lambda)$ matching from [13]). A $(\delta, \lambda)$ matching of the set $S$ is a set $M$ of edges of the factor graph $\mathcal{G}$ such that no two edges are connected to the same check node, every node in $S$ is connected to at least $\delta d_v$ edges of $M$, and every node in $S'$ is connected to at least $\lambda d_v$ edges of $M$. Here $S'$ is the set of variable nodes that are connected to at least $(1 - \lambda)d_v$ check nodes in $N(S)$.

If there is a $(\delta, \lambda)$ matching on the set $S$, then we consider the following labeling of the edges of $\mathcal{G}$. For a check node $c_j$, if it is adjacent to an edge $(v_i, c_j)$ in $M$, then set $\tau_{ij} = -x$ and $\tau_{i'j} = x$ for every other variable node $v_{i'} \in N(c_j)$, $i' \ne i$. Otherwise, label all of the edges adjacent to $c_j$ by 0. It can be seen that this labeling $\{\tau_{ij}\}$ satisfies condition (i) of dual feasibility (Definition 3), and furthermore:

$$\sum_{j \in N(v_i)} \tau_{ij} \le \begin{cases} (1 - 2\delta) d_v x & i \in S \\ (1 - \lambda) d_v x & i \in S^c \end{cases} \qquad (37)$$
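The labeling above is mechanical once a matching is given. The Python sketch below builds $\{\tau_{ij}\}$ from a set $M$ of (variable, check) index pairs on a toy parity-check matrix; the function name and the toy example are illustrative, and the construction of the matching itself (which relies on the expansion argument of [13]) is not shown.

```python
def matching_labels(H, M, x):
    """Edge labels tau[i][j] from a matching M of (variable, check) pairs.

    A check c_j with matched edge (v_i, c_j) gets tau_ij = -x and tau_i'j = +x
    on its other edges; unmatched checks get 0 on all of their edges.
    """
    m, n = len(H), len(H[0])
    matched = {}
    for i, j in M:
        if not H[j][i]:
            raise ValueError(f"({i}, {j}) is not an edge of the factor graph")
        if j in matched:
            raise ValueError(f"check {j} is used by two edges of M")
        matched[j] = i
    tau = [[0.0] * m for _ in range(n)]
    for j, i0 in matched.items():
        for i in range(n):
            if H[j][i]:
                tau[i][j] = -x if i == i0 else x
    return tau
```

In a toy graph with $d_v = 2$ where $v_0$ is matched on both of its checks ($\delta = 1$), the row sum for $v_0$ equals $(1 - 2\delta)d_v x = -2x$, matching the first case of (37) with equality.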
We now take $\lambda = 2 - 2\delta + 1/d_v$. Let us define a new likelihood vector $\gamma'$ by

$$\gamma'_i = \begin{cases} -C & i \in S \\ 1 & i \in S^c \end{cases} \qquad (38)$$

If a dual feasible set exists that satisfies the feasibility condition for the vector $\gamma'$, then this implies that $\mathrm{FCP}(S, C)$ holds. Now, since $C < \frac{2\delta - 1}{1 - \lambda}$, if we choose $x$ to be

$$x = \frac{1}{(1 - \lambda) d_v}, \qquad (39)$$

then it is clear that $(1 - 2\delta)d_v x < -C$. So the dual feasibility condition is satisfied if we can construct the required $(\delta, \lambda)$ matching for $S$. From [13], if $|S| < \frac{3\delta - 2}{2\delta - 1}\alpha n$ and $\mathcal{G}$ is a bipartite $(\alpha n, \delta d_v)$ expander, the desired matching exists. This proves that $\mathrm{FCP}(S, C)$ holds. Since this argument holds for every set $S$ of size $t = \frac{3\delta - 2}{2\delta - 1}\alpha n$, we conclude that $\mathcal{C}$ has $\mathrm{FCP}(t, C)$.



C Proof of Lemma 5.2



Consider a vector $\omega$ in the fundamental cone $\mathcal{K} = \mathcal{K}(H)$ of the parity-check matrix $H$. Without loss of generality, we may assume that $S = \{1, 2, \ldots, t\}$. For each $1 \le i \le t$, let the neighbors of the variable node $v_i$ in the $(p, q)$-matching on $S$ be denoted by $c^i_1, c^i_2, \ldots, c^i_p$. The check nodes $c^i_j$ are $p \times t$ distinct nodes. From the definition of $\mathcal{K}$, if $\omega \in \mathcal{K}$, then for each $c^i_j$ we may write:

$$\omega_i \le \sum_{i' \in N(c^i_j) \setminus \{i\}} \omega_{i'}, \qquad 1 \le i \le t,\ 1 \le j \le p. \qquad (40)$$

We add all inequalities of (40) for $1 \le i \le t$ and $1 \le j \le p$. For $i \le t$, $\omega_i$ appears exactly $p$ times on the left-hand side of the sum, and at most $d_v - p$ times on the right. For $i > t$, $\omega_i$ appears in at most $d_v - q$ inequalities, and only on the right-hand side. This comes directly from the definition of a $(p, q)$-matching on the set $S$. Therefore

$$p \sum_{i \in S} \omega_i \le (d_v - p) \sum_{i \in S} \omega_i + (d_v - q) \sum_{i \in S^c} \omega_i, \qquad (41)$$



and thus,

$$\frac{2p - d_v}{d_v - q} \sum_{i \in S} \omega_i \le \sum_{i \in S^c} \omega_i, \qquad (42)$$

which proves that $\mathcal{C}$ has the desired fundamental cone property.
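The robustness constant in (42) is explicit in the matching parameters. The sketch below evaluates $C = \frac{2p - d_v}{d_v - q}$; it is illustrative only (the function name and the argument checks are ours), and the resulting fundamental cone property is useful only when the returned value exceeds 1.

```python
def fcp_constant(dv, p, q):
    """Robustness constant (2p - dv)/(dv - q) from (42)."""
    if not (0 < p <= dv and 0 <= q < dv):
        raise ValueError("expects 0 < p <= dv and 0 <= q < dv")
    return (2 * p - dv) / (dv - q)
```

For example, $d_v = 4$, $p = 4$ and $q = 2$ give $C = 2$, while $d_v = 5$, $p = 4$, $q = 2$ give only $C = 1$.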
D Proof of Theorem 5.2



We denote the sets of variable nodes and check nodes by $X_v$ and $X_c$, respectively. For a fixed $\omega \in [0, 1]^T$, let $\mathcal{B}$ be the set of all minimal $T$-local deviations, and $\mathcal{B}_i$ be the set of minimal $T$-local deviations that result from a skinny tree rooted at the variable node $v_i$. Also, assume $S$ is the random set of flipped bits when the flip probability is $p$. Interchangeably, we also use $S$ to refer to the set of variable nodes corresponding to the flipped bit indices. We are interested in the probability that $f_{\gamma'}(\beta^{(\omega)}) > 0$ for all $\beta^{(\omega)} \in \mathcal{B}$. Recall that

$$f_{\gamma'}(\beta^{(\omega)}) = \sum_{i \in S^c} \beta^{(\omega)}_i - C \sum_{i \in S} \beta^{(\omega)}_i.$$

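For simplicity we denote this event by $f_{\gamma'}(\mathcal{B}) > 0$. Since the bits are flipped independently and with the same probability, we have the following union bound:

$$\mathbb{P}\big(f_{\gamma'}(\mathcal{B}) > 0\big) \ge 1 - n\,\mathbb{P}\big(f_{\gamma'}(\mathcal{B}_1) \le 0\big). \qquad (43)$$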



Now consider the full tree of height $2T$ that is rooted at the node $v_1$ and contains every node $u$ in $\mathcal{G}$ that is no more than $2T$ distant from $v_1$, i.e., $d(v_1, u) \le 2T$. We denote this tree by $B(v_1, 2T)$. To every variable node $u$ of $B(v_1, 2T)$, we assign a label $l(u)$, which is equal to $-C\omega_{h(u)}$ if $u \in S$, and $\omega_{h(u)}$ if $u \in S^c$, where $h(u)$ is the height of $u$ in the tree and $(\omega_0, \omega_2, \ldots, \omega_{2T-2}) = \omega$. We can now see that the event $f_{\gamma'}(\mathcal{B}_1) > 0$ is equivalent to the event that for all skinny subtrees $\mathcal{T}$ of $B(v_1, 2T)$ of height $2T$, the sum of the labels on the variable nodes of $\mathcal{T}$ is positive. In other words, if $\Gamma_1$ is the set of all skinny trees of height $2T$ that are rooted at $v_1$, then $f_{\gamma'}(\mathcal{B}_1) > 0$ is equivalent to:

$$\min_{\mathcal{T} \in \Gamma_1} \sum_{v \in \mathcal{T} \cap X_v} l(v) > 0. \qquad (44)$$

We assign to each node $u$ (either check or variable node) of $B(v_1, 2T)$ a random variable $Z_u$, which is equal to the contribution of the node $u$ itself and its offspring in the tree $B(v_1, 2T)$ to the quantity $\min_{\mathcal{T} \in \Gamma_1} \sum_{v \in \mathcal{T} \cap X_v} l(v)$. The value of $Z_u$ can be determined recursively from the values at its children. Furthermore, the distribution of $Z_u$ only depends on the height of $u$ in $B(v_1, 2T)$. Therefore, to find the distribution of $Z_u$, we use $Y_0, Y_1, \ldots, Y_{T-1}$ as random variables with the same distribution as $Z_u$ when $u$ is a variable node ($Y_0$ is assigned to the lowest-level variable nodes), and likewise $X_0, X_1, \ldots, X_{T-1}$ for the check nodes. It then follows that:
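$$Y_0 = \omega_0 \eta,$$
$$X_i = \min\big(Y_i^{(1)}, \ldots, Y_i^{(d_c - 1)}\big), \quad \forall i \ge 0,$$
$$Y_i = \omega_i \eta + X_{i-1}^{(1)} + \cdots + X_{i-1}^{(d_v - 1)}, \quad \forall i > 0, \qquad (45)$$

where the $X^{(j)}$'s and $Y^{(j)}$'s are independent copies of the corresponding random variables, and $\eta$ is a random variable that takes the value $-C$ with probability $p$ and the value $1$ with probability $1 - p$. It follows that

$$\mathbb{P}\big(f_{\gamma'}(\mathcal{B}_1) \le 0\big) = \mathbb{P}\big(X_{T-1}^{(1)} + \cdots + X_{T-1}^{(d_v)} \le 0\big) \le \big(\mathbb{E}\, e^{-tX_{T-1}}\big)^{d_v}. \qquad (46)$$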
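Recursion (45) can also be explored numerically. The Python sketch below is a standard population-dynamics Monte Carlo (our choice of technique, not part of the proof): it keeps a pool of samples per level, builds $Y_i$ and $X_i$ as in (45), and estimates the probability in the middle of (46). Like the analysis, it assumes a tree-like neighborhood; the function name, pool size and seed are illustrative.

```python
import random


def tree_failure_estimate(p, C, dv, dc, omega, pop=2000, seed=0):
    """Population-dynamics estimate of P(X_{T-1}^(1) + ... + X_{T-1}^(dv) <= 0).

    Follows recursion (45): Y_0 = omega_0*eta, X_i = min of dc-1 copies of Y_i,
    Y_i = omega_i*eta + sum of dv-1 copies of X_{i-1}; eta = -C w.p. p, else 1.
    Each level is represented by `pop` samples; copies are drawn from the pool.
    """
    if not omega:
        raise ValueError("omega must contain at least one weight")
    rng = random.Random(seed)

    def eta():
        return -C if rng.random() < p else 1.0

    def check_level(Y):
        # X_i: minimum over dc - 1 independent draws of Y_i.
        return [min(rng.choice(Y) for _ in range(dc - 1)) for _ in range(pop)]

    Y = [omega[0] * eta() for _ in range(pop)]
    X = check_level(Y)
    for w in omega[1:]:
        Y = [w * eta() + sum(rng.choice(X) for _ in range(dv - 1)) for _ in range(pop)]
        X = check_level(Y)
    fails = sum(1 for _ in range(pop) if sum(rng.choice(X) for _ in range(dv)) <= 0)
    return fails / pop
```

Sweeping $p$ with $C$, $d_v$, $d_c$ and $\omega$ fixed gives a rough empirical picture of where the per-root failure probability starts to decay; it is a sanity check, not a substitute for the Laplace transform bounds.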

The last inequality follows from Markov's inequality and holds for all $t > 0$. The rest of the proof is essentially an appropriate modification of the derivations of [15] for the Laplace transform evolution of the variables $X_i$ and $Y_i$, to account for a non-unit robustness factor $C$. By upper bounding the Laplace transforms of the variables recursively, it is possible to show that (see Lemma 8 of [15]; the argument is the same in our case)

$$\mathbb{E}\, e^{-tX_i} \le \big(\mathbb{E}\, e^{-tX_{i-j}}\big)^{(d_v-1)^j} \prod_{0 \le k \le j-1} \Big((d_c - 1)\,\mathbb{E}\, e^{-t\omega_{i-k}\eta}\Big)^{(d_v-1)^k} \qquad (47)$$

for all $1 \le j < i < T$.

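If we take the weight vector as $\omega = (1, 2, \ldots, 2^j, \rho, \rho, \ldots, \rho)$ for some integer $1 \le j < T$, and use equation (47) with $i = T - 1$ and $T - 1 - j$ steps, we obtain:

$$\mathbb{E}\, e^{-tX_{T-1}} \le \big(\mathbb{E}\, e^{-tX_j}\big)^{(d_v-1)^{T-j-1}} \Big((d_c - 1)\,\mathbb{E}\, e^{-t\rho\eta}\Big)^{\frac{(d_v-1)^{T-j-1} - 1}{d_v - 2}}.$$

$\rho$ and $t$ can be chosen to jointly minimize $\mathbb{E}\, e^{-tX_j}$ and $\mathbb{E}\, e^{-t\rho\eta}$ in the above, which, along with (43) and (46), results in

$$\mathbb{P}\big(f_{\gamma'}(\mathcal{B}) \le 0\big) \le n\,\gamma^{-d_v}\, c^{\,d_v (d_v-1)^{T-j-1}},$$

where $\gamma = \big((d_c - 1)\,\mathbb{E}\, e^{-t\rho\eta}\big)^{1/(d_v-2)}$ and $c = \gamma\,\mathbb{E}\, e^{-tX_j}$, evaluated at the chosen $\rho$ and $t$. If $c < 1$, then the probability of error tends to zero, as stated in Theorem 5.2.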
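The factor involving $\eta$ can be optimized in closed form. With $s = t\rho$, $\mathbb{E}\, e^{-s\eta} = p\,e^{sC} + (1 - p)\,e^{-s}$ is convex in $s$; when $(1 - p) > pC$ its minimum over $s > 0$ is attained at $s^* = \frac{1}{C+1}\ln\frac{1-p}{pC}$, and otherwise the infimum is 1 (approached as $s \to 0^+$). The Python sketch below evaluates this; for $C = 1$ it reduces to the familiar $2\sqrt{p(1-p)}$. It covers only this factor, not $\mathbb{E}\, e^{-tX_j}$, and the function name is ours.

```python
import math


def min_laplace_eta(p, C):
    """Return (value, s_star) minimizing E[exp(-s*eta)] over s > 0.

    eta = -C with probability p and 1 with probability 1 - p, so
    E[exp(-s*eta)] = p*exp(s*C) + (1 - p)*exp(-s); here s stands for t*rho.
    If (1 - p) <= p*C the function is nondecreasing on s > 0 and the
    infimum 1 is only approached as s -> 0+, reported as (1.0, 0.0).
    """
    if not 0 < p < 1 or C <= 0:
        raise ValueError("requires 0 < p < 1 and C > 0")
    ratio = (1 - p) / (p * C)
    if ratio <= 1:
        return 1.0, 0.0
    s = math.log(ratio) / (C + 1)
    return p * math.exp(s * C) + (1 - p) * math.exp(-s), s
```

At the optimum the value equals $(C+1)\,C^{-C/(C+1)}\,p^{1/(C+1)}(1-p)^{C/(C+1)}$, which makes explicit how a larger robustness factor $C$ raises this factor for a given $p$.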